Abstract: The problem of curve registration appears in many different areas of application,
ranging from neuroscience to road traffic modeling. In the present work, we propose a
nonparametric testing framework in which we develop a generalized likelihood ratio test to perform
curve registration. We first prove that, under the null hypothesis, the resulting test statistic
is asymptotically distributed as a chi-squared random variable. This result, often referred to
as Wilks' phenomenon, provides a natural threshold for the test of a prescribed asymptotic
significance level and a natural measure of lack-of-fit in terms of the p-value of the $\chi^2$-test.
We also prove that the proposed test is consistent, i.e., its power is asymptotically equal to
1. Some numerical experiments on synthetic datasets are reported as well.



Introduction 

Boosted by applications in different areas such as biology, medicine, computer vision and road 
traffic forecasting, the problem of curve registration and, more particularly, some aspects of 
this problem related to nonparametric and semiparametric estimation, have been explored in 
a number of recent statistical studies. In this context, the model used for deriving statistical 
inference assumes that the input data consist of a finite collection of noisy signals possessing 
the following feature: Each input signal is obtained from a given signal, termed mean template 
or structural pattern, by a parametric deformation and by adding a white noise. In what follows, 
we will refer to this as the "deformed mean template" model. The main difficulties for developing 
statistical inference in this problem are caused by the nonlinearity of the deformations and the 
fact that not only the deformations but also the mean template that was used to generate the 
observed data are unknown. 

While the problems of estimating the mean template, the deformations and some other related
objects have been thoroughly investigated in recent years, the question of the adequacy
of modeling the available data by the aforementioned semiparametric model has received little
attention. With the present work, we intend to fill this gap by introducing a nonparametric
goodness-of-fit testing framework that allows us to propose a measure of appropriateness of a
deformed mean template model. To this end, we focus our attention on the case where the only
allowed deformations are translations and propose a measure of goodness-of-fit based on the
p-value of a chi-squared test.

In full generality, the mathematical formulation of the "deformed mean template" model is the
following. We are given a sample of size $n$ of noisy signals $\{Y_m;\ m = 1,\dots,n\}$ having a common
structural pattern $f$, that is

$$dY_m(x) = f(\phi(x,\tau_m))\,dx + \sigma_m\,dW_m(x), \qquad x\in[0,1]^d,\quad m = 1,\dots,n, \tag{1}$$

where $\phi$ is a known function determining the type of the deformation and $\tau_m$ is a
finite-dimensional parameter that instantiates the deformation. Typical examples are:

(a) Shifted curve model: $\phi(x,\tau) = x - \tau$, where $\tau\in\mathbb{R}^d$ is the shift parameter,

(b) Periodic signal model: $\phi(x,\tau) = \tau x$, where the signal $f$ is a 1-periodic univariate function,
$x$ is one-dimensional and $\tau\in\mathbb{R}$ is the period of the noise-free signal,

(c) Rigid deformation model: $\phi(x,\tau) = s(Rx + t)$, where $\tau = (s, R, t)$ with $s > 0$ being the scale,
$R$ being the rotation and $t\in\mathbb{R}^d$ being the translation.

Starting from Golubev [23] and Kneip and Gasser [30], semiparametric and nonparametric
estimation in different instances of problem (1) have been intensively investigated; see for
instance [4,11,13-15,19,40,42] for the shifted curve model and [9,25,43] for a slightly extended
case of affine transforms of shifted curves, [10,12] for the periodic signal model and [6] for the
rigid deformation model. More general deformations have been considered in [7,21,29,37,38]
with applications to image warping [5,22].

Let us assume now that a collection of sample curves $\{Y_m;\ m = 1,\dots,n\}$ is available. Prior to
estimating the common template, the deformations or any other object involved in (1), it is natural
to check the appropriateness of model (1). The aim of the present work is to develop a theoretically
justified approach for carrying out such tests. To achieve this goal, we consider the
particular case of the shifted curve model, or the slightly more general affinely transformed shifted
curve model, with $n = 2$, i.e., the case where two functions $Y$ and $Y^{\#}$ are observed such that

$$dY(x) = f(x)\,dx + \sigma\,dW(x), \qquad dY^{\#}(x) = f^{\#}(x)\,dx + \sigma^{\#}\,dW^{\#}(x), \qquad \forall x\in[0,1], \tag{2}$$

where $W$ and $W^{\#}$ are two independent Brownian motions, $f$ and $f^{\#}$ are two unknown 1-periodic
signals and $\sigma, \sigma^{\#} > 0$ are parameters representing the noise magnitude. The hypothesis
we wish to test is that the curves $f$ and $f^{\#}$ coincide up to a scale change, a shift of the argument
and a vertical translation:

$$H_0:\ \text{there exists some } (a^*, b^*, \tau^*)\in\mathbb{R}^2\times[0,1] \ \text{ s.t. }\ f(x) = a^* f^{\#}(x+\tau^*) + b^*, \quad \forall x\in[0,1]. \tag{3}$$

If the null hypothesis $H_0$ is accepted, then we are in the setting of model (1) for the particular
case of a deformation given by a shift. Even if the shifted curve model seems to be a very narrow
subclass of the models given by (1), it plays a central role in several applications. To cite a few of
them:

ECG interpretation: An electrocardiogram (ECG) can be seen as a collection of replicas of nearly
the same signal, up to a time shift. Significant information about heart malformations or
diseases can be extracted from the mean signal if we are able to align the available curves,
while the deflections would not be so correctly identified if we simply considered the mean of
the shifted curves. For more details we refer to [42], where random shifts are considered, and
they are estimated along with their common distribution in the asymptotics of a growing
number of curves.

Road traffic forecast: In [32], a road traffic forecasting procedure is introduced. For this, archetypes
of the different types of road traffic behavior on the Parisian highway network are built,
using a hierarchical classification method. In each obtained cluster, the curves all represent
the same events, only randomly shifted in time. The mean of the aligned (unshifted) curves is more
representative of a given behavior than the mean of the shifted ones, and hence provides more
efficient predictions.

Keypoint matching: An important problem in computer vision is to decide whether two points
in the same image or in two different images correspond to the same real-world point. The
points in images are then usually described by their local neighborhoods. More precisely,
the regression function of the magnitude of the gradient over the direction of the gradient
of the image restricted to a given neighborhood is considered as a local descriptor (cf. the
SIFT descriptor [33]). The methodology we shall develop in the present paper makes it possible to
test whether two points in images coincide, up to a rotation and an illumination change,
since a rotation corresponds to shifting the argument of the regression function by the
angle of the rotation.

The problem of estimating the parameters of the deformation is a semiparametric one, since the 
deformation involves a finite number of parameters that have to be estimated by assuming that the 
unknown mean template is merely a nuisance parameter. In contrast, the testing problem we are 
concerned with is clearly nonparametric. Indeed, both the null hypothesis and the alternative in 
the context of the present study are nonparametric, i.e., the parameter describing the probability 
distribution of the observations is infinite-dimensional not only under the alternative but also 
under the null hypothesis. Surprisingly, the statistical literature on this type of testing problems 
is very scarce. Indeed, while [36] and [26] analyze the optimality and the adaptivity of testing 
procedures in the setting of a parametric null hypothesis against a nonparametric alternative, 
to the best of our knowledge, the only papers concerned with nonparametric null hypotheses 
are [1,2] and [20]. Unfortunately, the results derived in [1,2] are inapplicable in our set-up 
since the null hypothesis in our problem is neither linear nor convex. The set-up of [20] is 
closer to ours. However, they only investigate the minimax rates of separation without providing 
the asymptotic distribution of the proposed test statistic, which generally results in an overly 
conservative testing procedure. Furthermore, their theoretical framework comprises a condition 
on the sup-norm-entropy of the null hypothesis, which is irrelevant in our set-up and may be 
violated. 

We adopt, in this work, the approach based on the Generalized Likelihood Ratio (GLR) tests,
cf. [17] for a comprehensive account of the topic. The advantage of this approach is that it
provides a general framework for constructing testing procedures which asymptotically achieve
the prescribed significance level for the first kind error and, under mild conditions, have a power
that tends to one. It is worth mentioning that in the context of nonparametric testing, the use of
the generalized likelihood ratio leads to a substantial improvement upon the likelihood ratio, very
popular in parametric statistics. In simple words, the generalized likelihood makes it possible to
incorporate some prior information on the unknown signal in the test statistic, which introduces
more flexibility and turns out to be crucial both in theory and in practice [18].

We prove that under the null hypothesis the GLR test statistic is asymptotically distributed as
a $\chi^2$ random variable. This allows us to choose a threshold that makes it possible to
asymptotically control the test significance level without being excessively conservative. Such results are
referred to as Wilks' phenomena. In this relation, let us quote [17]: "While we have observed the 
Wilks' phenomenon and demonstrated it for a few useful cases, it is impossible for us to verify the 
phenomenon for all nonparametric hypothesis testing problems. The Wilks' phenomenon needs 
to be checked for other problems that have not been covered in this paper. In addition, most of 
the topics outlined in the above discussion remains open and are technically and intellectually 
challenging. More developments are needed, which will push the core of statistical theory and 
methods forward." 

The rest of the paper is organized as follows. After a brief presentation of the model, we
introduce the GLR framework in Section 1. The main results characterizing the asymptotic
behavior of the proposed testing procedure, based on GLR testing for a large variety of shrinkage
weights, are stated in Section 3. Some numerical examples illustrating the theoretical results
are included in Section 4. The proofs of the lemmas and of the theorems are postponed to the
Appendix.

1. Model and notation 

In the following, we consider the curve registration problem in which the data $\{Y(x) : x\in[0,1]\}$
and $\{Y^{\#}(x) : x\in[0,1]\}$ are available, generated by the Gaussian white noise model

$$dY(x) = f(x)\,dx + \sigma\,dW(x), \qquad dY^{\#}(x) = f^{\#}(x)\,dx + \sigma^{\#}\,dW^{\#}(x), \tag{4}$$

where $(W, W^{\#})$ is a two-dimensional Brownian motion. (It is implicitly assumed that $f$ and $f^{\#}$
are square integrable, which makes model (4) sensible.) This model is often seen as a prototype
of a nonparametric statistical model, since it is asymptotically equivalent to many other statistical
models [8,16,24,34,39] and it captures the main theoretical difficulties of statistical inference.
Let us consider, for the moment, that the noise magnitudes $\sigma$ and $\sigma^{\#}$ are known and focus on the
hypothesis testing problem stemming from the curve registration set-up. Prior to switching to
the definition of the generalized likelihood ratio tests, let us recall that model (4) is equivalent
to the Gaussian sequence model obtained by projecting the processes $Y(\cdot)$ and $Y^{\#}(\cdot)$ onto the
Fourier basis:

$$Y_j = c_j + \sigma\epsilon_j, \qquad Y_j^{\#} = c_j^{\#} + \sigma^{\#}\epsilon_j^{\#}, \qquad j = 0,1,2,\dots \tag{5}$$

where $c_j = \int_0^1 f(x)\,e^{2\mathrm{i}j\pi x}\,dx$ and $c_j^{\#} = \int_0^1 f^{\#}(x)\,e^{2\mathrm{i}j\pi x}\,dx$ are the complex Fourier coefficients. The
complex valued random variables $\epsilon_j, \epsilon_j^{\#}$ are i.i.d. standard Gaussian: $\epsilon_j, \epsilon_j^{\#}\sim\mathcal{N}_{\mathbb{C}}(0,1)$, which
means that their real and imaginary parts are independent $\mathcal{N}(0,1)$ random variables. In what
follows, we will use boldface letters for denoting vectors or infinite sequences so that, for example,
$\mathbf{c}$ and $\mathbf{c}^{\#}$ refer to $\{c_j;\ j = 1,2,\dots\}$ and $\{c_j^{\#};\ j = 1,2,\dots\}$, respectively.
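As an illustration of the sequence model (5), the following minimal Python sketch approximates the Fourier coefficients by a rectangle rule and adds complex Gaussian noise whose real and imaginary parts are independent $\mathcal{N}(0,1)$ variables. The function names, the quadrature rule and the grid size are assumptions made for this example only; they do not come from the paper.

```python
import cmath
import math
import random


def fourier_coefficients(f, n_coef, n_grid=4096):
    """Approximate c_j = int_0^1 f(x) exp(2*i*pi*j*x) dx for j = 0..n_coef
    with a rectangle rule on n_grid equispaced points (an assumed quadrature)."""
    values = [f(k / n_grid) for k in range(n_grid)]
    coefs = []
    for freq in range(n_coef + 1):
        total = sum(v * cmath.exp(1j * 2 * math.pi * freq * k / n_grid)
                    for k, v in enumerate(values))
        coefs.append(total / n_grid)
    return coefs


def complex_standard_normal(rng):
    """Draw from N_C(0, 1): independent N(0, 1) real and imaginary parts."""
    return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


def noisy_sequence(coefs, sigma, rng):
    """Return Y_j = c_j + sigma * eps_j, as in the sequence model (5)."""
    return [c + sigma * complex_standard_normal(rng) for c in coefs]


# Example: f(x) = cos(2*pi*x) has c_1 = 1/2 and c_0 = c_2 = c_3 = 0.
rng = random.Random(0)
c = fourier_coefficients(lambda x: math.cos(2 * math.pi * x), n_coef=3, n_grid=64)
Y = noisy_sequence(c, sigma=0.1, rng=rng)
```

For a trigonometric polynomial the rectangle rule is exact as soon as the grid is fine enough, which is why such a small grid suffices in this toy example.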

We are interested in testing the hypothesis (3), which translates in the Fourier domain to

$$H_0:\ \exists\,(a^*, \bar\tau^*)\in\mathbb{R}\times[0,2\pi[\ \text{ s.t. }\ c_j = a^* e^{-\mathrm{i}j\bar\tau^*} c_j^{\#} \quad \forall j = 1,2,\dots \tag{6}$$

Indeed, one easily checks that the projection onto the functions $e^{2\mathrm{i}j\pi x}$, $j\ge 1$, cancels the term $b^*$ in
(3), resulting in (6) with $\bar\tau^* = 2\pi\tau^*$. Furthermore, if (6) is verified, then $b^*$ can be recovered by
the formula $b^* = c_0 - a^* c_0^{\#}$. If no additional assumptions are imposed on the functions $f$ and
$f^{\#}$, or equivalently on their Fourier coefficients $\mathbf{c}$ and $\mathbf{c}^{\#}$, the nonparametric testing problem
has no consistent solution. A natural assumption widely used in nonparametric statistics is that
$\mathbf{c} = (c_1, c_2, \dots)$ and $\mathbf{c}^{\#} = (c_1^{\#}, c_2^{\#}, \dots)$ belong to some Sobolev ball

$$\mathcal{F}_{s,L} = \Big\{\mathbf{u} = (u_1, u_2, \dots) :\ \sum_{j=1}^{+\infty} j^{2s}|u_j|^2 \le L\Big\},$$

where the positive real numbers $s$ and $L$ stand for the smoothness and the radius of the class
$\mathcal{F}_{s,L}$. It is also possible to consider other smoothness classes, such as Besov bodies, in
which case it would be more appropriate to project not onto the Fourier basis but onto a
wavelet basis, as is done in [41].
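Returning to the passage from (3) to (6), the step described as easy to check can be written out as follows. This is a sketch that assumes $f^{\#}$ is 1-periodic and uses the change of variable $y = x + \tau^*$, with $\bar\tau^* = 2\pi\tau^*$ as in the text:

```latex
\begin{align*}
c_j &= \int_0^1 f(x)\,e^{2\mathrm{i}j\pi x}\,dx
     = a^*\int_0^1 f^{\#}(x+\tau^*)\,e^{2\mathrm{i}j\pi x}\,dx
       + b^*\int_0^1 e^{2\mathrm{i}j\pi x}\,dx \\
    &= a^*\,e^{-2\mathrm{i}j\pi\tau^*}\int_0^1 f^{\#}(y)\,e^{2\mathrm{i}j\pi y}\,dy + 0
     = a^*\,e^{-\mathrm{i}j\bar\tau^*}\,c_j^{\#}, \qquad j \ge 1 .
\end{align*}
```

For $j = 0$ the exponential equals 1 and its integral no longer vanishes, so the same computation gives $c_0 = a^* c_0^{\#} + b^*$, which is the recovery formula for $b^*$.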
In order to convey the main ideas underlying the GLR test analyzed in the present work, we
focus on the case where the null hypothesis corresponds to the spatially shifted curve model.
This means that in the rest of this section we assume that

$$H_0:\ \exists\,\tau^*\in[0,2\pi[\ \text{ s.t. }\ c_j = e^{-\mathrm{i}j\tau^*}c_j^{\#} \quad \forall j = 1,2,\dots$$

$$H_1:\ \inf_{\tau}\sum_{j=1}^{+\infty}\big|c_j - e^{-\mathrm{i}j\tau}c_j^{\#}\big|^2 \ge \rho \tag{7}$$

for some $\rho > 0$. In other terms, under $H_0$ the graph of the function $f^{\#}$ is obtained from that of $f$
by a translation.

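Because of the Gaussian nature of the noise, the negative log-likelihood of the parameters
$\mathbf{u}^{\bullet,\#} = (\mathbf{u}, \mathbf{u}^{\#})$ given the data $\mathbf{Y}^{\bullet,\#} = (\mathbf{Y}, \mathbf{Y}^{\#})$ is

$$\ell(\mathbf{Y}^{\bullet,\#}, \mathbf{u}^{\bullet,\#}) = \frac{1}{2\sigma^2}\|\mathbf{Y}-\mathbf{u}\|_2^2 + \frac{1}{2(\sigma^{\#})^2}\|\mathbf{Y}^{\#}-\mathbf{u}^{\#}\|_2^2. \tag{8}$$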

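To present the penalized likelihood ratio test, which is a variant of the GLR test, we introduce
a penalization in terms of a weighted $\ell^2$-norm of $\mathbf{u}^{\bullet,\#}$. In this context, the choice of the $\ell^2$-norm
penalization is mainly motivated by the fact that Sobolev regularity assumptions are made on
the functions $f$ and $f^{\#}$. For a sequence of non-negative real numbers $\boldsymbol{\omega}$, we set

$$p\ell(\mathbf{Y}^{\bullet,\#}, \mathbf{u}^{\bullet,\#}) = \frac{1}{2\sigma^2}\Big(\|\mathbf{Y}-\mathbf{u}\|_2^2 + \sum_{j\ge1}\omega_j|u_j|^2\Big) + \frac{1}{2(\sigma^{\#})^2}\Big(\|\mathbf{Y}^{\#}-\mathbf{u}^{\#}\|_2^2 + \sum_{j\ge1}\omega_j|u_j^{\#}|^2\Big). \tag{9}$$

The penalized likelihood ratio test is based on the test statistic

$$\Delta(\mathbf{Y}^{\bullet,\#}) = \min_{\mathbf{u}^{\bullet,\#}:\ H_0 \text{ is true}} p\ell(\mathbf{Y}^{\bullet,\#}, \mathbf{u}^{\bullet,\#}) - \min_{\mathbf{u}^{\bullet,\#}} p\ell(\mathbf{Y}^{\bullet,\#}, \mathbf{u}^{\bullet,\#}). \tag{10}$$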

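It is clear that $\Delta(\mathbf{Y}^{\bullet,\#})$ is always non-negative. Furthermore, it is small when $H_0$ is satisfied and
large if $H_0$ is violated. The minimization of the quadratic functional (9) is an easy exercise
and leads to

$$\min_{\mathbf{u}^{\bullet,\#}} p\ell(\mathbf{Y}^{\bullet,\#}, \mathbf{u}^{\bullet,\#}) = \frac{1}{2\sigma^2}\sum_{j\ge1}\frac{\omega_j}{1+\omega_j}|Y_j|^2 + \frac{1}{2(\sigma^{\#})^2}\sum_{j\ge1}\frac{\omega_j}{1+\omega_j}|Y_j^{\#}|^2.$$

Similar but a bit more involved computations lead to the following simple expression:

$$\Delta(\mathbf{Y}^{\bullet,\#}) = \frac{1}{2(\sigma^2+(\sigma^{\#})^2)}\,\min_{\tau\in[0,2\pi]}\sum_{j\ge1}\frac{|Y_j - e^{-\mathrm{i}j\tau}Y_j^{\#}|^2}{1+\omega_j}. \tag{11}$$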

From now on, it will be more convenient to use the notation $\nu_j = 1/(1+\omega_j)$. The elements of the
sequence $\boldsymbol{\nu} = \{\nu_j;\ j\ge1\}$ are hereafter referred to as shrinkage weights. They are allowed to
take any value between 0 and 1. Even the value 0 will be authorized, corresponding to the limit
case when $\omega_j = +\infty$, or equivalently to our belief that the corresponding Fourier coefficient is
0. To ease notation, we will use the symbol $\circ$ to denote coefficient-by-coefficient multiplication,
also known as the Hadamard product, and $\mathbf{e}(\tau)$ will stand for the sequence $(e^{-\mathrm{i}\tau}, e^{-2\mathrm{i}\tau}, \dots)$. The
test statistic can then be written as:

$$\Delta(\mathbf{Y}^{\bullet,\#}) = \frac{1}{2(\sigma^2+(\sigma^{\#})^2)}\,\min_{\tau\in[0,2\pi]}\big\|\mathbf{Y} - \mathbf{e}(\tau)\circ\mathbf{Y}^{\#}\big\|_{2,\boldsymbol{\nu}}^2, \tag{12}$$

where $\|\mathbf{u}\|_{2,\boldsymbol{\nu}}^2 = \sum_{j\ge1}\nu_j|u_j|^2$, and the goal is to find the asymptotic distribution of this
quantity under the null hypothesis.
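To make the structure of the test statistic concrete, here is a minimal Python sketch that approximates the minimum over $\tau$ by a search on a regular grid of $[0, 2\pi[$. It assumes the statistic has the form $\frac{1}{2(\sigma^2+(\sigma^{\#})^2)}\min_\tau\sum_{j\ge1}\nu_j|Y_j - e^{-\mathrm{i}j\tau}Y_j^{\#}|^2$; the function name, the grid search and the grid size are choices made for this example and are not taken from the paper.

```python
import cmath
import math


def penalized_lr_statistic(Y, Y_sharp, nu, sigma, sigma_sharp, n_grid=2048):
    """Grid approximation of the penalized likelihood ratio statistic.

    Y[k], Y_sharp[k] and nu[k] correspond to the frequency j = k + 1.
    The minimum over tau is taken over n_grid equispaced points of [0, 2*pi),
    so the returned value is an upper bound on the exact minimum.
    """
    best = math.inf
    for m in range(n_grid):
        tau = 2 * math.pi * m / n_grid
        value = sum(
            w * abs(y - cmath.exp(-1j * (k + 1) * tau) * ys) ** 2
            for k, (w, y, ys) in enumerate(zip(nu, Y, Y_sharp))
        )
        best = min(best, value)
    return best / (2 * (sigma ** 2 + sigma_sharp ** 2))
```

A finer grid, or a local refinement around the best grid point, would tighten the approximation of the minimum; the grid search is used here only because it is the simplest correct option.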
3. Main results



The test based on the generalized likelihood ratio statistic involves a sequence $\boldsymbol{\nu}$, which can be
chosen freely by the user. However, we are able to provide theoretical guarantees only
under some conditions on these weights. To state these conditions, we focus on the case $\sigma = \sigma^{\#}$
and choose a positive integer $N = N_\sigma \ge 2$, which represents the number of Fourier coefficients
involved in our testing procedure. In addition to requiring that $0 \le \nu_j \le 1$ for every $j$, we assume
that:

(A) $\nu_1 = 1$, and $\nu_j = 0$, $\forall j > N_\sigma$.

(B) for some positive constant $c$, it holds that $\sum_{j\ge1}\nu_j^2 \ge cN_\sigma$.

Moreover, we will use the following condition in the proof of the consistency of the test:

(C) $\exists\, c > 0$, such that $\min\{j \ge 0 :\ \nu_j < c\} \to +\infty$, as $\sigma \to 0$.

In simple words, this condition implies that the number of terms $\nu_j$ that are above a given
strictly positive level goes to $+\infty$ as $\sigma$ converges to 0. If $N_\sigma \to +\infty$ as $\sigma \to 0$, then all the
aforementioned conditions are satisfied for the shrinkage weights $\boldsymbol{\nu}$ of the form $\nu_{j+1} = h(j/N_\sigma)$,
where $h : \mathbb{R} \to [0,1]$ is an integrable function, supported on $[0,1]$, continuous at 0 and satisfying
$h(0) = 1$. The classical examples of shrinkage weights include:



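$\nu_j = \mathbb{1}_{\{j \le N_\sigma\}}$ (projection weight)
$\nu_j = \dots$, $\kappa > 0$, $\mu > 1$, (Tikhonov weight)
$\nu_j = \dots$, $\mu > 0$. (Pinsker weight)                                  (13)

Note that condition (C) is satisfied in all these examples with $c = 0.5$, or any other value in
$(0, 1)$. From here on, we write $\Delta_\sigma(\mathbf{Y}^{\bullet,\#})$ instead of $\Delta(\mathbf{Y}^{\bullet,\#})$ in order to stress its dependence on $\sigma$.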
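The construction $\nu_{j+1} = h(j/N_\sigma)$ can be sketched in a few lines of Python. The projection profile follows the text; the Tikhonov and Pinsker profiles below use the standard textbook forms $1/(1+(x/\kappa)^\mu)$ and $(1-x^\mu)_+$, truncated so that $\nu_j = 0$ for $j > N_\sigma$. These two closed forms, the default parameter values and all function names are assumptions of this sketch, not formulas taken from the paper.

```python
def projection_profile(x):
    """h(x) = 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


def tikhonov_profile(x, kappa=1.0, mu=2.0):
    """Assumed form: h(x) = 1 / (1 + (x / kappa)^mu) on [0, 1), 0 elsewhere."""
    return 1.0 / (1.0 + (x / kappa) ** mu) if 0.0 <= x < 1.0 else 0.0


def pinsker_profile(x, mu=2.0):
    """Assumed form: h(x) = (1 - x^mu)_+ on [0, 1], 0 elsewhere."""
    return max(0.0, 1.0 - x ** mu) if 0.0 <= x <= 1.0 else 0.0


def shrinkage_weights(profile, n_sigma, length):
    """Return [nu_1, ..., nu_length] with nu_{j+1} = profile(j / n_sigma)."""
    return [profile(j / n_sigma) for j in range(length)]


# Example: the three families with N_sigma = 10, padded with zeros up to 25 terms.
nu_proj = shrinkage_weights(projection_profile, 10, 25)
nu_tikh = shrinkage_weights(tikhonov_profile, 10, 25)
nu_pins = shrinkage_weights(pinsker_profile, 10, 25)
```

With these profiles each family satisfies $\nu_1 = h(0) = 1$ and vanishes beyond index $N_\sigma$, in line with condition (A).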

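Theorem 1. Let $\mathbf{c}\in\mathcal{F}_{1,L}$ and $|c_1| > 0$. Assume that the shrinkage weights $\nu_j$ are chosen to satisfy
conditions (A), (B), $N_\sigma \to +\infty$ and $\sigma^2 N_\sigma^{5/2}\log(N_\sigma) = o(1)$. Then, under the null hypothesis, the
test statistic $\Delta_\sigma(\mathbf{Y}^{\bullet,\#})$ is asymptotically distributed as a Gaussian random variable:

$$\frac{\Delta_\sigma(\mathbf{Y}^{\bullet,\#}) - 4\|\boldsymbol{\nu}\|_1}{4\|\boldsymbol{\nu}\|_2} \xrightarrow{\ \mathcal{D}\ } \mathcal{N}(0,1). \tag{14}$$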



The main outcome of this result is a test of hypothesis $H_0$ that is asymptotically of a prescribed
significance level $\alpha\in(0,1)$. Indeed, let us define the test that rejects $H_0$ if and only if

$$\Delta_\sigma(\mathbf{Y}^{\bullet,\#}) > 4\|\boldsymbol{\nu}\|_1 + 4z_{1-\alpha}\|\boldsymbol{\nu}\|_2, \tag{15}$$

where $z_{1-\alpha}$ is the $(1-\alpha)$-quantile of the standard Gaussian distribution.

Corollary 1. The test of hypothesis $H_0$ defined by the critical region (15) is asymptotically of
significance level $\alpha$.
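The decision rule given by the critical region (15) is straightforward to evaluate. The sketch below, in Python, computes the threshold $4\|\boldsymbol{\nu}\|_1 + 4z_{1-\alpha}\|\boldsymbol{\nu}\|_2$ with the standard library's Gaussian quantile function; the function names are illustrative and the weights are assumed to be non-negative, so that $\|\boldsymbol{\nu}\|_1$ is simply their sum.

```python
import math
from statistics import NormalDist


def critical_value(nu, alpha=0.05):
    """Right-hand side of (15): 4*||nu||_1 + 4*z_{1-alpha}*||nu||_2.

    The weights are assumed non-negative, so ||nu||_1 = sum(nu).
    """
    norm1 = sum(nu)
    norm2 = math.sqrt(sum(w * w for w in nu))
    z = NormalDist().inv_cdf(1.0 - alpha)
    return 4.0 * norm1 + 4.0 * z * norm2


def reject_null(delta_sigma, nu, alpha=0.05):
    """Return True when the observed statistic exceeds the threshold of (15)."""
    return delta_sigma > critical_value(nu, alpha)
```

For instance, with four projection weights equal to 1 and $\alpha = 5\%$, the threshold is $16 + 8z_{0.95}$, roughly 29.16.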
Remark 1. Let us consider the case of projection weights $\nu_j = \mathbb{1}(j \le N_\sigma)$. One can reformulate
the asymptotic relation stated in Theorem 1 by claiming that $\frac{1}{2}\Delta_\sigma(\mathbf{Y}^{\bullet,\#})$ is approximately
$\mathcal{N}(2N_\sigma, 4N_\sigma)$ distributed. Since the latter distribution approaches the chi-squared distribution
with $2N_\sigma$ degrees of freedom, we get:

$$\tfrac{1}{2}\Delta_\sigma(\mathbf{Y}^{\bullet,\#}) \approx \chi^2_{2N_\sigma}, \qquad \text{as } \sigma\to0.$$

In the case of general shrinkage weights satisfying the assumptions stated in the beginning of
this section, an analogous relation holds as well:

$$\frac{\|\boldsymbol{\nu}\|_1}{2\|\boldsymbol{\nu}\|_2^2}\,\Delta_\sigma(\mathbf{Y}^{\bullet,\#}) \approx \chi^2_{2\|\boldsymbol{\nu}\|_1^2/\|\boldsymbol{\nu}\|_2^2}, \qquad \text{as } \sigma\to0.$$

Results of this type are often referred to as Wilks' phenomenon.
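The scaling factor and the number of degrees of freedom in the chi-squared approximation can be recovered by matching the first two moments. The following short derivation is a heuristic sketch: it uses only the Gaussian limit of Theorem 1 and the fact that a $\chi^2_d$ variable has mean $d$ and variance $2d$; the symbols $k$ and $d$ are introduced here for the computation.

```latex
\begin{align*}
\Delta_\sigma \approx \mathcal{N}\big(4\|\nu\|_1,\ 16\|\nu\|_2^2\big)
  \quad&\Longrightarrow\quad
  \mathbf{E}[k\Delta_\sigma] \approx 4k\|\nu\|_1,
  \qquad \mathrm{Var}[k\Delta_\sigma] \approx 16k^2\|\nu\|_2^2, \\
4k\|\nu\|_1 = d, \quad 16k^2\|\nu\|_2^2 = 2d
  \quad&\Longrightarrow\quad
  k = \frac{\|\nu\|_1}{2\|\nu\|_2^2},
  \qquad d = \frac{2\|\nu\|_1^2}{\|\nu\|_2^2}.
\end{align*}
```

For projection weights, $\|\nu\|_1 = \|\nu\|_2^2 = N_\sigma$, which gives $k = 1/2$ and $d = 2N_\sigma$.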

Remark 2. The p-value of the aforementioned test based on the Gaussian or chi-squared
approximation can be used as a measure of the goodness-of-fit or, in other terms, as a measure
of alignment for the pair of curves under consideration. If the observed two noisy curves lead to
the data $\mathbf{y}^{\bullet,\#}$, then the (asymptotic) p-value is defined as the upper-tail probability

$$\alpha^* = 1 - \Phi\Big(\frac{\Delta_\sigma(\mathbf{y}^{\bullet,\#}) - 4\|\boldsymbol{\nu}\|_1}{4\|\boldsymbol{\nu}\|_2}\Big),$$

where $\Phi$ stands for the c.d.f. of the standard Gaussian distribution.
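A minimal Python sketch of this p-value computation is given below. It assumes that the test rejects for large values of the statistic, as in the critical region (15), so that the p-value is the upper-tail Gaussian probability of the standardized statistic; the function name is illustrative.

```python
import math
from statistics import NormalDist


def asymptotic_p_value(delta_sigma, nu):
    """Upper-tail Gaussian p-value of the standardized statistic.

    Assumes rejection for large values of delta_sigma and non-negative weights.
    """
    norm1 = sum(nu)
    norm2 = math.sqrt(sum(w * w for w in nu))
    t = (delta_sigma - 4.0 * norm1) / (4.0 * norm2)
    return 1.0 - NormalDist().cdf(t)
```

With this convention, the test defined by (15) rejects at level $\alpha$ exactly when the p-value is smaller than $\alpha$, and a p-value close to 1 indicates a very good alignment of the two curves.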

So far, we have only focused on the behavior of the test under the null, without paying
attention to what happens under the alternative. The next theorem fills this gap by establishing
the consistency of the test defined by the critical region (15).

Theorem 2. Let condition (C) be satisfied and let $\sigma^2 N_\sigma$ tend to 0 as $\sigma \to 0$. Then the test
statistic $T_\sigma = \dfrac{\Delta_\sigma(\mathbf{Y}^{\bullet,\#}) - 4\|\boldsymbol{\nu}\|_1}{4\|\boldsymbol{\nu}\|_2}$ diverges under $H_1$, i.e.,

$$T_\sigma \xrightarrow{\ P\ } +\infty, \qquad \text{as } \sigma \to 0.$$

In other words, the result above claims that the power of the test defined via (15) is
asymptotically equal to one as the noise level $\sigma$ decreases to 0.

Remark 3. The previous theorem tells us nothing about the (minimax) rate of separation of
the null hypothesis from the alternative. In other words, Theorem 2 does not provide the rate
of divergence of $T_\sigma$. However, a rate is present in the proof (cf. Section A.3). In fact, in most
situations $\min\{j \ge 1 :\ \nu_j < c\}$ is of the order of $N_\sigma$, in which case we prove that

c P + 0[N- 2 ) + Q P {a^N- a ) 

as $\sigma \to 0$. This implies that, for instance, if $N_\sigma \to +\infty$ and satisfies $\sigma\sqrt{N_\sigma} = O(1)$, then $T_\sigma$ tends
to infinity if and only if $\rho/(\sigma\sqrt{\log N_\sigma}) \to \infty$. This argument can be made rigorous to establish
that the minimax rate of separation is at least $\sigma^{1/2}(\log\sigma^{-1})^{1/4}$. However, we will not go into the
details here since we believe that this rate is not optimal and intend to develop the minimax
approach in a future work.
4. Numerical experiments

We have implemented the proposed testing procedure (15) in Matlab and carried out a
number of numerical experiments on synthetic data. The aim of these experiments is merely
to show that the methodology developed in the present paper is applicable and to
illustrate how the different characteristics of the testing procedure, such as the significance
level, the power, etc., depend on the noise variance $\sigma^2$ and on the shrinkage weights $\boldsymbol{\nu}$. Following
the philosophy of reproducible research, we intend to make our code available for free download
on the authors' homepages.

4.1. Convergence of the test under $H_0$ and the influence of the shrinkage weights

In order to illustrate the convergence of the test (15) when $\sigma$ tends to zero, we carried out the following
experiment. We chose the function HeaviSine, considered as a benchmark in the signal processing
community, and computed its complex Fourier coefficients $\{c_j;\ j = 0,\dots,10^6\}$. For each value of
$\sigma$ taken from the set $\{2^{-k/2},\ k = 1,\dots,15\}$, we repeated 5000 times the following computations:

✓ set $N_\sigma = 50\sigma^{-1/2}$ (see footnote 1),

✓ generate the noisy sequence $\{Y_j;\ j = 0,\dots,N_\sigma\}$ by adding to $\{c_j\}$ an i.i.d. $\mathcal{N}_{\mathbb{C}}(0,\sigma^2)$
sequence $\{\xi_j\}$,

✓ randomly choose a parameter $\tau^*$ uniformly distributed in $[0, 2\pi]$, independent of $\{\xi_j\}$,

✓ generate the shifted noisy sequence $\{Y_j^{\#};\ j = 0,\dots,N_\sigma\}$ by adding to $\{e^{\mathrm{i}j\tau^*}c_j\}$ an i.i.d.
$\mathcal{N}_{\mathbb{C}}(0,\sigma^2)$ sequence $\{\xi_j^{\#}\}$, independent of $\{\xi_j\}$ and of $\tau^*$,

✓ compute the three values of the test statistic $\Delta_\sigma$ corresponding to the classical shrinkage
weights defined by (13) and compare these values with the threshold for $\alpha = 5\%$.
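The steps above can be sketched in Python as a small Monte Carlo loop. This is an illustration under stated assumptions, not the authors' Matlab code: the statistic is taken to be $\Delta_\sigma = \sigma^{-2}\min_\tau\sum_{j\ge1}\nu_j|Y_j - e^{-\mathrm{i}j\tau}Y_j^{\#}|^2$ (the normalization matching the centering $4\|\boldsymbol{\nu}\|_1$ in (14)), the minimum over $\tau$ is approximated on a regular grid, the noise has independent $\mathcal{N}(0,1)$ real and imaginary parts, and the coefficient sequence passed by the caller stands in for the HeaviSine coefficients. All names and default values are hypothetical.

```python
import cmath
import math
import random
from statistics import NormalDist


def delta_sigma(Y, Y_sharp, nu, sigma, n_grid):
    """(1/sigma^2) * min over a tau-grid of sum_j nu_j |Y_j - e^{-i j tau} Y_j^#|^2.

    Index k of each list corresponds to the frequency j = k + 1.
    """
    best = math.inf
    for m in range(n_grid):
        tau = 2 * math.pi * m / n_grid
        value = sum(w * abs(y - cmath.exp(-1j * (k + 1) * tau) * ys) ** 2
                    for k, (w, y, ys) in enumerate(zip(nu, Y, Y_sharp)))
        best = min(best, value)
    return best / sigma ** 2


def acceptance_rate(c, sigma, nu, alpha=0.05, n_rep=100, n_grid=400, seed=0):
    """Proportion of simulated samples under H0 for which the test (15) accepts."""
    rng = random.Random(seed)

    def noise():
        # N_C(0, 1): independent N(0, 1) real and imaginary parts
        return complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

    norm1 = sum(nu)
    norm2 = math.sqrt(sum(w * w for w in nu))
    threshold = 4 * norm1 + 4 * NormalDist().inv_cdf(1 - alpha) * norm2
    accepted = 0
    for _ in range(n_rep):
        tau_star = rng.uniform(0.0, 2 * math.pi)
        Y = [cj + sigma * noise() for cj in c]
        # shifted coefficients e^{i j tau*} c_j, so that H0 holds with shift tau*
        Y_sharp = [cmath.exp(1j * (k + 1) * tau_star) * cj + sigma * noise()
                   for k, cj in enumerate(c)]
        if delta_sigma(Y, Y_sharp, nu, sigma, n_grid) <= threshold:
            accepted += 1
    return accepted / n_rep
```

A call such as `acceptance_rate([1.0 / j**2 for j in range(1, 21)], sigma=0.05, nu=[1.0] * 20)` uses a hypothetical smooth signal and projection weights; under the assumptions above the returned proportion should be close to the nominal level $1 - \alpha$ for small $\sigma$.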

We denote by $p^{\rm proj}_{\rm accept}(\sigma)$, $p^{\rm Tikh}_{\rm accept}(\sigma)$ and $p^{\rm Pinsk}_{\rm accept}(\sigma)$ the proportion of experiments (among the $10^3$ that
have been realized) that led to a value of the corresponding test statistic lower than the threshold,
i.e., the proportion of experiments leading to the acceptance of the null hypothesis. We plotted in
Figure 1 the (linearly interpolated) curves $k \mapsto p^{\rm proj}_{\rm accept}(\sigma_k)$, $k \mapsto p^{\rm Tikh}_{\rm accept}(\sigma_k)$ and $k \mapsto p^{\rm Pinsk}_{\rm accept}(\sigma_k)$,
with $\sigma_k = 2^{-k/2}$. It can be clearly seen that for $\sigma = 2^{-7} \approx 8\times10^{-3}$, the proportion of true
negatives is almost equal to the nominal level 0.95. It is also worth noting that the three curves
are quite comparable, with a significant advantage for the curves corresponding to Pinsker's
and Tikhonov's weights: these curves converge faster to the level $1 - \alpha = 95\%$ than the curve
corresponding to the projection weights.



[Figure 1: plot titled "Convergence to the nominal level under $H_0$"; horizontal axis $n = -2\log_2(\sigma)$;
curves for the projection, Tikhonov and Pinsker shrinkage weights.]

Figure 1. The proportion of true negatives in the experiment described in Section 4.1 as a function of $\log_2\sigma^{-2}$ for
three different shrinkage weights: projection (Left), Tikhonov (Middle) and Pinsker (Right). One can observe that for
$\sigma = 2^{-15/2} \approx 5\times10^{-3}$, the proportion of true negatives is almost equal to the nominal level 0.95. Another observation
is that the Pinsker and the Tikhonov weights lead to a faster convergence to the nominal significance level.

Footnote 1. This value of $N_\sigma$ satisfies the assumptions required by our theoretical results.

imsart-generic ver. 2011/01/24 file: Hal_final.tex date: January 20, 2013

O. Collier and A. Dalalyan / Curve registration by nonparametric testing

4.2. Power of the test

In the previous experiment, we illustrated the behavior of the penalized likelihood ratio test
under the null hypothesis. The aim of the second experiment is to show what happens under the
alternative. To this end, we still use the HeaviSine function as signal $f$ and define $f^{\#} = f + \gamma\varphi$,
where $\gamma$ is a real parameter. Two cases are considered: $\varphi(t) = c\cos(4t)$ and $\varphi(t) = c/(1 + t^2)$,
where $c$ is a constant ensuring that $\varphi$ has an $L^2$ norm equal to 1. For each of these two pairs of
functions $(f, f^{\#})$, we repeated 5000 times the following computations:

✓ set $\sigma = 1$ and $N_\sigma = 50\sigma^{-1/2}$,

✓ compute the complex Fourier coefficients $\{c_j;\ j = 0,\dots,10^6\}$ and $\{c_j^{\#};\ j = 0,\dots,10^6\}$ of $f$
and $f^{\#}$, respectively,

✓ generate the noisy sequence $\{Y_j;\ j = 0,\dots,N_\sigma\}$ by adding to $\{c_j\}$ an i.i.d. $\mathcal{N}_{\mathbb{C}}(0,\sigma^2)$
sequence $\{\xi_j\}$,

✓ generate the shifted noisy sequence $\{Y_j^{\#};\ j = 0,\dots,N_\sigma\}$ by adding to $\{c_j^{\#}\}$ an i.i.d.
$\mathcal{N}_{\mathbb{C}}(0,\sigma^2)$ sequence $\{\xi_j^{\#}\}$, independent of $\{\xi_j\}$,

✓ compute the value of the test statistic $\Delta_\sigma$ corresponding to the projection weights and
compare this value with the threshold for $\alpha = 5\%$.

To show how the behavior of the test under $H_1$ depends on the distance between the null
and the alternative, we computed for each $\gamma$ the proportion of true positives, also called
the empirical power, among the 5000 random samples we have simulated. The results, plotted in
Figure 2, show that even for moderately small values of $\gamma$, the test succeeds in taking the correct
decision. It is a bit surprising that the result for the case $\varphi(t) = c\cos(4t)$ is better than that for
$\varphi(t) = c/(1 + t^2)$. Indeed, one can observe that the curve in the right panel approaches 1 much
faster than the curve in the left panel.

Figure 2. The proportion of true positives in the experiment described in Section 4.2 as a function of the parameter
$\gamma$ measuring the distance between the true parameter and the set of parameters characterizing the null hypothesis.
The main observation is that both curves tend to 1 very rapidly.

5. Conclusion 

In the present work, we provided a methodological and theoretical analysis of the curve
registration problem from a statistical standpoint based on nonparametric goodness-of-fit testing.
In the case where the noise is white Gaussian and additive with a small variance, we
established that the penalized log-likelihood ratio (PLR) statistic is asymptotically distribution free
under the null hypothesis. This result is valid for the weighted $\ell^2$-penalization under some mild
assumptions on the weights. Furthermore, we proved that the test based on the Gaussian (or
chi-squared) approximation of the PLR statistic is consistent. These results naturally carry over
to other nonparametric models for which asymptotic equivalence (in the Le Cam sense) with
the Gaussian white noise has been proven. It can be interesting, however, to develop a direct
inference in these models. In particular, the model of spatial Poisson processes (cf. [27]) can be
of special interest because of its applications in image analysis.

Some important issues closely related to the present work have not been treated here and
will be addressed in the near future. Perhaps the most important one is to determine the minimax rate of
separation of the null hypothesis from the alternative. The results we have shown tell us that
this rate is not slower than $\sigma^{1/2}(\log\sigma^{-1})^{1/4}$. However, it is very likely that this latter rate is
suboptimal. There is a large body of literature on the topic of minimax rates of separation (cf. the
book by Ingster and Suslina [28] and the references therein), but it mainly concentrates on the
case of a simple null hypothesis. We expect that the composite character of the null hypothesis in
our set-up will slow down the rate of convergence at least by a logarithmic factor. The adaptive
choice of the tuning parameter $N_\sigma$ is another central issue that has not been answered in the
present paper. We plan to tackle this issue in a future work.

Appendix A: Proofs of the theorems 

The proof of Wilks' phenomenon is divided into several parts. First we assume that $H_0$ is true
and study the convergence of the pseudo-estimator $\hat\tau$ (of the shift $\tau^*$) defined as the maximizer
of the log-likelihood over the interval $[\tau^* - \pi, \tau^* + \pi]$. Here, $\tau^*$ is an element of $[0, 2\pi[$ such that
$c_j = e^{-\mathrm{i}j\tau^*}c_j^{\#}$ for all $j \ge 1$.

A.1. Maximizer of the log-likelihood 

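Proposition 1. Let $H_0$ be satisfied, $\mathbf{c}\in\mathcal{F}_{1,L}$ and $|c_1| > 0$. If the shrinkage weights $\nu_j$ satisfy
conditions (A) and (B), then the solution $\hat\tau$ to the optimization problem

$$\hat\tau = \arg\max_{\tau:\ |\tau-\tau^*|\le\pi} M(\tau), \qquad \text{with } M(\tau) = \sum_{j\ge1}\nu_j\,\mathrm{Re}\big(e^{\mathrm{i}j\tau}\,Y_j\,\overline{Y_j^{\#}}\big),$$

satisfies the asymptotic relation

$$|\hat\tau - \tau^*| = \sigma\sqrt{\log N_\sigma}\,\big(1 + \sigma N_\sigma^{3/2}\big)\,O_P(1), \qquad \text{as } \sigma \to 0.$$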
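Proof of Proposition 1. If we set $\eta_j = e^{\mathrm{i}j\tau^*}\epsilon_j$ and $\eta_j^{\#} = \epsilon_j^{\#}$, we can write the decomposition

$$M(\tau) = \mathbf{E}[M(\tau)] + \sigma S(\tau) + \sigma^2 D(\tau - \tau^*),$$

where

$$\mathbf{E}[M(\tau)] = \sum_{j\ge1}\nu_j|c_j|^2\cos[j(\tau-\tau^*)],$$

$$S(\tau) = \sum_{j\ge1}\nu_j\,\mathrm{Re}\big(e^{\mathrm{i}j\tau}\big(c_j\overline{\epsilon_j^{\#}} + \epsilon_j\overline{c_j^{\#}}\big)\big),$$

$$D(\tau) = \sum_{j\ge1}\nu_j\,\mathrm{Re}\big(e^{\mathrm{i}j\tau}\,\eta_j\,\overline{\eta_j^{\#}}\big).$$

On the one hand, using the assumption $|c_1| > 0$ along with condition (A), we get that

$$\frac{\mathbf{E}[M(\tau)] - \mathbf{E}[M(\tau^*)]}{(\tau-\tau^*)^2} \le -\nu_1|c_1|^2\,\frac{1-\cos(\tau-\tau^*)}{(\tau-\tau^*)^2} \le -\frac{2|c_1|^2}{\pi^2} =: -C < 0.$$

Therefore,

$$\begin{aligned}
M(\tau) - M(\tau^*) &= \mathbf{E}[M(\tau)] - \mathbf{E}[M(\tau^*)] + \sigma[S(\tau) - S(\tau^*)] + \sigma^2[D(\tau-\tau^*) - D(0)] \\
&\le -C|\tau-\tau^*|^2 + \sigma|\tau-\tau^*|\cdot\|S'\|_\infty + \sigma^2|\tau-\tau^*|\cdot\|D'\|_\infty \\
&= |\tau-\tau^*|\,\big\{\sigma\|S'\|_\infty + \sigma^2\|D'\|_\infty - C|\tau-\tau^*|\big\}.
\end{aligned}$$

Using this result, for every $a > 0$, we get

$$\begin{aligned}
\mathbf{P}(|\hat\tau-\tau^*| > a) &\le \mathbf{P}\Big\{\sup_{|\tau-\tau^*|>a}\big[M(\tau) - M(\tau^*)\big] \ge 0\Big\} \\
&\le \mathbf{P}\Big\{\sup_{|\tau-\tau^*|>a}\big[\sigma\|S'\|_\infty + \sigma^2\|D'\|_\infty - C|\tau-\tau^*|\big] \ge 0\Big\} \\
&\le \mathbf{P}\big\{\sigma\|S'\|_\infty + \sigma^2\|D'\|_\infty \ge Ca\big\}.
\end{aligned}$$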
Choosing $a = \sigma\sqrt{\log N_\sigma}\,(2 + \sigma N_\sigma^{3/2})\,z$, we get

$$\mathbf{P}\big(|\hat\tau-\tau^*| > \sigma\sqrt{\log N_\sigma}\,(2 + \sigma N_\sigma^{3/2})\,z\big) \le \mathbf{P}\big(\|S'\|_\infty > 2Cz\sqrt{\log N_\sigma}\big) + \mathbf{P}\big(\|D'\|_\infty > Cz\,N_\sigma^{3/2}\sqrt{\log N_\sigma}\big).$$

On the other hand, since

$$S'(\tau) = \sum_{j\ge1} j\,\nu_j\,|c_j|\,\mathrm{Re}\big(e^{\mathrm{i}j\tau}\xi_j\big),$$

where the $\xi_j$ are i.i.d. complex valued random variables, whose real and imaginary parts are
independent $\mathcal{N}(0, 2)$ variables, the large deviations of the sup-norm of $S'$ can be controlled by using
the following lemma.

Lemma 1. The sup-norm of the function $S(t) = \sum_{j=0}^{K} s_j\{\cos(jt)\,\xi_j + \sin(jt)\,\xi'_j\}$, where $\{\xi_j\}$ and
$\{\xi'_j\}$ are two independent sequences of i.i.d. $\mathcal{N}(0, 1)$ random variables, satisfies

$$\mathbf{P}\big(\|S\|_\infty > \|s\|_2\,x\big) \le (K + 1)\,e^{-x^2/2}, \qquad \forall x > 0.$$

Proof. This result is a direct consequence of Berman's inequality, which we recall in Section B
for the reader's convenience. □

Using this lemma and the fact that $N_\sigma \ge 2$, we get that $\mathbf{P}\big(\|S'\|_\infty > 2\sqrt{L}\,\sqrt{2y\log N_\sigma}\big) \le
2N_\sigma^{1-y} \le 2^{2-y}$ for every $y > 1$. Finally, the large deviations of the term $\|D'\|_\infty$ are controlled by
using Lemma 3 below. Putting these inequalities together, we find that for any $\alpha\in(0,1)$, there
exists $z > 0$ such that

$$\mathbf{P}\big(|\hat\tau - \tau^*| > \sigma\sqrt{\log N_\sigma}\,(2 + \sigma N_\sigma^{3/2})\,z\big) < \alpha.$$

In conclusion, we get that $\hat\tau - \tau^*$ is, in probability, at most of the order $\sigma\sqrt{\log N_\sigma}\,(1 + \sigma N_\sigma^{3/2})$. □
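For completeness, the chain of inequalities behind the bound $2N_\sigma^{1-y} \le 2^{2-y}$ can be spelled out. The sketch below assumes that Lemma 1 is applied to $S'$ with $K = N_\sigma$ terms and $x = \sqrt{2y\log N_\sigma}$, and it uses $N_\sigma + 1 \le 2N_\sigma$ together with $N_\sigma \ge 2$ and $y > 1$:

```latex
\begin{align*}
\mathbf{P}\big(\|S'\|_\infty > \|s\|_2\sqrt{2y\log N_\sigma}\big)
  &\le (N_\sigma + 1)\,e^{-y\log N_\sigma}
   = (N_\sigma + 1)\,N_\sigma^{-y} \\
  &\le 2N_\sigma^{1-y}
   \le 2\cdot 2^{1-y} = 2^{2-y}.
\end{align*}
```

The right-hand side tends to 0 as $y \to \infty$, uniformly in $N_\sigma \ge 2$, which is what allows the probability to be made smaller than any prescribed $\alpha$ by choosing $z$ large enough.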



A.2. Proof of Theorem 1 



One can check that, under Ho, 

+ 0O 



1 



mm 



(T z re[0,2;r[ 



T. v i\ Y i 

7=1 



e -iyry#| 



1 

— ~ min 

O \t\<7T 



in {D a (T) + 2C a (T) + P a (T)}, (16) 



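where we have used the notation:

$$
\begin{aligned}
D_\sigma(\tau) &= \sum_{j=1}^{+\infty}\nu_j|c_j|^2\big|1 - e^{-ij(\tau-\tau^*)}\big|^2 && \text{(deterministic term)} \\
C_\sigma(\tau) &= \sigma\sum_{j=1}^{+\infty}\nu_j\,\mathrm{Re}\big[c_j(1 - e^{-ij(\tau-\tau^*)})\,\overline{(\epsilon_j - e^{-ij\tau}\epsilon_j^{\#})}\big] && \text{(cross term)} \\
P_\sigma(\tau) &= \sigma^2\sum_{j=1}^{+\infty}\nu_j\big|\epsilon_j - e^{-ij\tau}\epsilon_j^{\#}\big|^2 && \text{(principal term)}
\end{aligned}
$$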


(Since $H_0$ is assumed satisfied, there exists $\tau^* \in [0, 2\pi[$ such that $c_j = e^{-ij\tau^*}c_j^{\#}$ for all $j \ge 1$.) We denote by $\hat\tau$ the pseudo-estimator of $\tau^*$ defined as the minimizer of the right-hand side of (16) and study the asymptotic behavior of the terms $D_\sigma$, $C_\sigma$ and $P_\sigma$ separately.

✓ For the deterministic term, it holds that

$$
|D_\sigma(\hat\tau)| \le \sum_{j=1}^{+\infty} j^2\nu_j|c_j|^2(\hat\tau-\tau^*)^2 \le (\hat\tau-\tau^*)^2\sum_{j=1}^{+\infty} j^2|c_j|^2 \le L(\hat\tau-\tau^*)^2 = \{\sigma^2(1+\sigma^2N_\sigma^3)\log N_\sigma\}\,O_P(1).
$$
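The steps behind the bound on the deterministic term can be spelled out as follows. This is a worked sketch: it assumes that the weights satisfy $0 \le \nu_j \le 1$ (which is what the second inequality appears to use) and that $\hat\tau - \tau^*$ obeys the rate established in Proposition 1.

```latex
\[
\big|1 - e^{-ij\theta}\big|^2 = 2 - 2\cos(j\theta) = 4\sin^2(j\theta/2) \le j^2\theta^2,
\qquad \theta = \hat\tau - \tau^*,
\]
\[
(\hat\tau-\tau^*)^2 = \sigma^2\log N_\sigma\,(1+\sigma N_\sigma^{3/2})^2\,O_P(1),
\qquad
(1+\sigma N_\sigma^{3/2})^2 \le 2\,(1+\sigma^2 N_\sigma^{3}).
\]
```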

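✓ Let us turn now to the cross term. It holds that

$$
C_\sigma(\tau) = \sigma\sum_{j=1}^{+\infty}\nu_j\Big\{(1-\cos[j(\tau-\tau^*)])\,\mathrm{Re}\big[c_j\overline{(\epsilon_j - e^{-ij\tau}\epsilon_j^{\#})}\big] + \sin[j(\tau^*-\tau)]\,\mathrm{Im}\big[c_j\overline{(\epsilon_j - e^{-ij\tau}\epsilon_j^{\#})}\big]\Big\}.
$$

Thus, as $C_\sigma(\tau^*) = 0$, we have

$$|C_\sigma(\hat\tau)| \le |\hat\tau - \tau^*|\cdot\|C_\sigma'\|_\infty.$$

By arguments similar to those used in the proof of Proposition 1, we check that $\|C_\sigma'\|_\infty$ is of the order $\sigma\sqrt{\log N_\sigma}$ in probability. Therefore, it holds that

$$|C_\sigma(\hat\tau)| = \{\sigma^2(1+\sigma N_\sigma^{3/2})\log N_\sigma\}\,O_P(1).$$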

✓ Let us now study the last term, $P_\sigma(\tau) = \sigma^2\sum_{j=1}^{+\infty}\nu_j|\epsilon_j - e^{-ij\tau}\epsilon_j^{\#}|^2$, which will determine the asymptotic behavior of the test statistic. Denoting $\eta_j = e^{ij\tau^*}\epsilon_j$ and $\eta_j^{\#} = \epsilon_j^{\#}$, we can rewrite this term as $P_\sigma(\tau) = \sigma^2\sum_{j=1}^{+\infty}\nu_j|\eta_j - e^{-ij(\tau-\tau^*)}\eta_j^{\#}|^2$. We wish to prove that under $H_0$, if conditions (A), (B), $N_\sigma \to +\infty$ and $\sigma^2N_\sigma^{5/2}\log(N_\sigma) = o(1)$ are fulfilled, then

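To check this property, we decompose the principal term as follows:

$$
T_\sigma(\hat\tau) = T_\sigma(\tau^*) + \underbrace{\frac{P_\sigma(\hat\tau) - P_\sigma(\tau^*)}{4\sigma^2\big(\sum_{k\ge1}\nu_k^2\big)^{1/2}}}_{:=R_\sigma(\hat\tau)}.
$$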

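✓ We start by writing $T_\sigma(\tau^*)$ as

$$
T_\sigma(\tau^*) = \sum_{j=1}^{N_\sigma} X_{j,\sigma}, \qquad\text{with}\qquad X_{j,\sigma} = \frac{\nu_j\big(|\eta_j - \eta_j^{\#}|^2 - 4\big)}{4\big(\sum_{k\ge1}\nu_k^2\big)^{1/2}},
$$

and applying the Berry-Esseen inequality [35, Theorem 5.4], which is possible since the $X_{j,\sigma}$'s are independent random variables with mean $0$ and finite third moment. Furthermore, we have $B_\sigma = \sum_{j=1}^{N_\sigma}\mathrm{Var}(X_{j,\sigma}) = 1$ and $L_\sigma = \sum_{j=1}^{N_\sigma}\mathbf E|X_{j,\sigma}|^3 \le C/\sqrt{N_\sigma}$. Therefore, the Berry-Esseen inequality yields

$$\sup_x |F_\sigma(x) - \Phi(x)| \le K L_\sigma,$$

where $F_\sigma(x) = \mathbf P\big(B_\sigma^{-1/2}\sum_{j=1}^{N_\sigma}X_{j,\sigma} \le x\big)$, $\Phi$ is the c.d.f. of the standard Gaussian distribution and $K$ is an absolute constant. Hence, since $L_\sigma \to 0$ as $N_\sigma \to +\infty$, $T_\sigma(\tau^*)$ converges in distribution to a standard Gaussian random variable.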




✓ It remains now to prove that $R_\sigma(\hat\tau)$ tends to $0$ in probability, which, in view of Slutsky's lemma, will be sufficient for completing the proof. It holds that

+00 N a . , _ „. , 

W = L " + L V Re^(e^-^ - 1) = Y_ \ ] Re (e^), 

y=1 2(L/=1 vf) 2 y=i 2(Ly = i 

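with $\bar\tau$ some real number between $\hat\tau$ and $\tau^*$. Then, by virtue of Lemma 3,

$$
|R_\sigma(\hat\tau)| \le \frac{|\hat\tau-\tau^*|}{2\big(\sum_{k=1}^{+\infty}\nu_k^2\big)^{1/2}}\,\sup_{t}\Big|\sum_{j=1}^{N_\sigma} j\nu_j\,\mathrm{Re}\big(e^{ijt}\eta_j\overline{\eta_j^{\#}}\big)\Big| = \{\sigma(1+\sigma N_\sigma^{3/2})N_\sigma\log N_\sigma\}\cdot O_P(1).
$$

Hence, $R_\sigma(\hat\tau) = o_P(1)$ and the desired result follows.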
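The Gaussian approximation of $T_\sigma(\tau^*)$ obtained from the Berry-Esseen inequality can be illustrated by simulation. The sketch below is not taken from the paper: it assumes the summands have the form $X_{j,\sigma} = \nu_j(|\eta_j - \eta_j^{\#}|^2 - 4)/(4\|\nu\|_2)$, uses hypothetical weights $\nu_j = 1$ for $j \le 200$, and measures the Kolmogorov distance between the simulated sample and the standard Gaussian law.

```python
import numpy as np
from math import erf, sqrt

def simulate_T_star(nu, n_sim=5000, seed=0):
    """Draw T_sigma(tau*) = sum_j X_j with
    X_j = nu_j (|eta_j - eta_j^#|^2 - 4) / (4 ||nu||_2)."""
    rng = np.random.default_rng(seed)
    nu = np.asarray(nu, dtype=float)
    n = len(nu)
    # eta_j and eta_j^#: complex noise with independent N(0, 1) real and imaginary parts.
    eta = rng.standard_normal((n_sim, n)) + 1j * rng.standard_normal((n_sim, n))
    eta_sharp = rng.standard_normal((n_sim, n)) + 1j * rng.standard_normal((n_sim, n))
    sq = np.abs(eta - eta_sharp) ** 2          # each entry has mean 4 and variance 16
    return (sq - 4.0) @ nu / (4.0 * np.linalg.norm(nu))

def kolmogorov_distance_to_normal(sample):
    """sup_x |F_n(x) - Phi(x)| for the empirical c.d.f. F_n of `sample`."""
    x = np.sort(sample)
    n = len(x)
    phi = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in x])
    upper = np.arange(1, n + 1) / n - phi
    lower = phi - np.arange(0, n) / n
    return float(max(upper.max(), lower.max()))

nu = np.ones(200)                              # hypothetical weights, N_sigma = 200
T = simulate_T_star(nu)
dist = kolmogorov_distance_to_normal(T)
print(f"mean = {T.mean():.3f}, var = {T.var():.3f}, Kolmogorov distance = {dist:.3f}")
```

The reported distance mixes the approximation error controlled by $K L_\sigma$ with the sampling error of the empirical c.d.f., so it should be read only as a rough sanity check.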
A.3. Proof of Theorem 2

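Let us study the test statistic $T_\sigma = (\Delta_\sigma(Y^{\bullet,\#}) - 4\|\nu\|_1)/(4\|\nu\|_2)$ and show that it tends to $+\infty$ in probability under $H_1$. The hypothesis $H_1$ is assumed to be satisfied throughout this section. It holds that

$$
\begin{aligned}
\Delta_\sigma(Y^{\bullet,\#}) &= \frac{1}{\sigma^2}\min_{\tau\in[0,2\pi]}\sum_{j=1}^{+\infty}\nu_j\big|Y_j - e^{-ij\tau}Y_j^{\#}\big|^2 \\
&= \frac{1}{\sigma^2}\min_{\tau\in[0,2\pi]}\sum_{j=1}^{+\infty}\nu_j\big|(c_j - e^{-ij\tau}c_j^{\#}) + \sigma(\epsilon_j - e^{-ij\tau}\epsilon_j^{\#})\big|^2 \\
&\ge \frac{1}{\sigma^2}\min_{\tau\in[0,2\pi]}\Big\{\sum_{j=1}^{+\infty}\nu_j\big|c_j - e^{-ij\tau}c_j^{\#}\big|^2\Big\} - \frac{2}{\sigma}\max_{\tau\in[0,2\pi]}\Big\{\sum_{j=1}^{+\infty}\nu_j\big|c_j - e^{-ij\tau}c_j^{\#}\big|\cdot\big|\epsilon_j - e^{-ij\tau}\epsilon_j^{\#}\big|\Big\}.
\end{aligned}
$$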

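Let us focus on the first term. Denoting $\delta_\sigma = \min\{j \ge 1 : \nu_j < c\}$, we get by condition (C) that $\delta_\sigma \to +\infty$, which implies

$$
\begin{aligned}
\min_{\tau\in[0,2\pi]}\sum_{j\ge1}\nu_j\big|c_j - e^{-ij\tau}c_j^{\#}\big|^2 &\ge c\min_{\tau\in[0,2\pi]}\sum_{j<\delta_\sigma}\big|c_j - e^{-ij\tau}c_j^{\#}\big|^2 \\
&\ge c\Big(\min_{\tau\in[0,2\pi]}\sum_{j=1}^{+\infty}\big|c_j - e^{-ij\tau}c_j^{\#}\big|^2 - 4L\delta_\sigma^{-2}\Big) \\
&\ge c\,(\rho - 4L\delta_\sigma^{-2}).
\end{aligned}
$$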

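Now, the second term satisfies

$$
\begin{aligned}
\sum_{j\ge1}\nu_j\big|c_j - e^{-ij\tau}c_j^{\#}\big|\cdot\big|\epsilon_j - e^{-ij\tau}\epsilon_j^{\#}\big| &\le 2\max_{j\le N_\sigma}\big(|\epsilon_j| \vee |\epsilon_j^{\#}|\big)\sum_{j\ge1}\big(|c_j| + |c_j^{\#}|\big) \\
&= O_P\big(\sqrt{\log N_\sigma}\big)\sum_{j\ge1}\big(|c_j| + |c_j^{\#}|\big) \\
&\le O_P\big(\sqrt{\log N_\sigma}\big)\Big(\sum_{j\ge1}j^{-2}\Big)^{1/2}\Big(\sum_{j\ge1}j^2\big(|c_j| + |c_j^{\#}|\big)^2\Big)^{1/2}.
\end{aligned}
$$

Putting it all together, we get

$$
T_\sigma = \frac{\Delta_\sigma(Y^{\bullet,\#}) - 4\|\nu\|_1}{4\|\nu\|_2} \ge \frac{c\rho - 4Lc\,\delta_\sigma^{-2} + O_P\big(\sigma\sqrt{\log N_\sigma}\big)}{4\sigma^2\sqrt{N_\sigma}} \xrightarrow{\;P\;} +\infty.
$$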

Appendix B: Bounds for the maxima of random sums 

In this section, we give some technical lemmas which are useful in the proofs of this paper. We are interested in bounding the maximum of the sum of a growing number of terms, so the non-asymptotic result given in [3] will be useful:

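Proposition 2 (Berman [3]). Suppose that $g_j$, $j = 1, \dots, n$, are continuously differentiable functions satisfying $\sum_{j=1}^{n} g_j(t)^2 = 1$ for all $t$, and $\xi_j \overset{\text{iid}}{\sim} \mathcal N(0, 1)$. Then, for every $x > 0$, we have

$$
\mathbf P\Big(\sup_{t\in[a,b]}\sum_{j=1}^{n} g_j(t)\xi_j > x\Big) \le \frac{L_0}{2\pi}\,e^{-x^2/2} + \int_x^{+\infty}\frac{e^{-t^2/2}}{\sqrt{2\pi}}\,dt, \qquad\text{with}\qquad L_0 = \int_a^b\Big[\sum_{j=1}^{n} g_j'(t)^2\Big]^{1/2}dt.
$$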



We will also use the following fact about moderate deviations of random variables that can be written as a sum of squares of independent centered Gaussian random variables.

Lemma 2. Let $N$ be some positive integer and let $\eta_j$, $j = 1, \dots, N$, be independent complex-valued random variables such that their real and imaginary parts are independent standard Gaussian variables. Let $s = (s_1, \dots, s_N)$ be a vector of real numbers. For any $y > 0$, it holds that

$$
\mathbf P\Big\{\sum_{j=1}^{N} s_j^2|\eta_j|^2 > 2\|s\|_2^2 + 2\sqrt2\,\|s\|_4^2\,y + 2\|s\|_\infty^2\,y^2\Big\} \le e^{-y^2/2},
$$

with the standard notations $\|s\|_\infty = \max_{j=1,\dots,N}|s_j|$ and $\|s\|_q = \big(\sum_{j=1}^{N}|s_j|^q\big)^{1/q}$.

Proof. This is a direct consequence of [31, Lemma 1]. □
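The tail bound of Lemma 2 can be checked by a short simulation. This is a minimal sketch under the assumption that the threshold reads $2\|s\|_2^2 + 2\sqrt2\|s\|_4^2 y + 2\|s\|_\infty^2 y^2$; the vector `s`, the level `y` and the number of replications are hypothetical choices.

```python
import numpy as np

def chi_square_tail(s, y, n_sim=20000, seed=0):
    """Estimate P(sum_j s_j^2 |eta_j|^2 > threshold(y)) and return it with the threshold."""
    rng = np.random.default_rng(seed)
    s = np.asarray(s, dtype=float)
    n = len(s)
    # |eta_j|^2 for complex eta_j with independent N(0, 1) real and imaginary parts.
    eta_sq = rng.standard_normal((n_sim, n)) ** 2 + rng.standard_normal((n_sim, n)) ** 2
    stat = eta_sq @ s**2
    norm2_sq = np.sum(s**2)                    # ||s||_2^2
    norm4_sq = np.sqrt(np.sum(s**4))           # ||s||_4^2
    norm_inf_sq = np.max(np.abs(s)) ** 2       # ||s||_inf^2
    threshold = 2 * norm2_sq + 2 * np.sqrt(2) * norm4_sq * y + 2 * norm_inf_sq * y**2
    return float(np.mean(stat > threshold)), float(threshold)

s = np.ones(20)                                # hypothetical vector s
y = 2.0
empirical, threshold = chi_square_tail(s, y)
bound = np.exp(-y**2 / 2)
print(f"threshold = {threshold:.3f}, empirical = {empirical:.4f}, bound = {bound:.4f}")
```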

Lemma 3. Let $N$ be some positive integer and let $\eta_j, \eta_j^{\#}$, $j = 1, \dots, N$, be independent complex-valued random variables such that their real and imaginary parts are independent standard Gaussian variables. Let $s = (s_1, \dots, s_N)$ be a vector of real numbers. Denote $S(t) = \sum_{j=1}^{N} s_j\,\mathrm{Re}\big(e^{ijt}\eta_j\overline{\eta_j^{\#}}\big)$ for every $t$ in $[0, 2\pi]$ and $\|S\|_\infty = \sup_{t\in[0,2\pi]}|S(t)|$. Then,

$$
\mathbf P\big\{\|S\|_\infty > \sqrt2\,x\,(\|s\|_2 + y\|s\|_\infty)\big\} \le (N+1)e^{-x^2/2} + e^{-y^2/2}, \qquad \forall x, y > 0.
$$



Proof. First note that we cannot directly use Berman's formula, since the summands are not Gaussian. However, they are conditionally Gaussian if the conditioning is done, for example, with respect to the sequence $(\eta_j)$. Indeed,

$$
\sum_{j=1}^{N} s_j\,\mathrm{Re}\big(e^{ijt}\eta_j\overline{\eta_j^{\#}}\big)\,\Big|\,(\eta_j) \;\sim\; \sum_{j=1}^{N} s_j|\eta_j|\{\cos(jt)\xi_j - \sin(jt)\xi'_j\} \qquad\text{with}\quad \xi_j, \xi'_j \overset{\text{iid}}{\sim} \mathcal N(0, 1).
$$

It follows by Lemma 1 that

$$
\mathbf P\Big(\sup_{t\in[0,2\pi]}\Big|\sum_{j=1}^{N} s_j\,\mathrm{Re}\big(e^{ijt}\eta_j\overline{\eta_j^{\#}}\big)\Big| > x\Big(\sum_{j=1}^{N} s_j^2|\eta_j|^2\Big)^{1/2}\,\Big|\,(\eta_j)\Big) \le (N+1)\exp\Big(-\frac{x^2}{2}\Big).
$$

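Let us now denote by $\zeta$ the square root of the random variable $\sum_{j=1}^{N} s_j^2|\eta_j|^2$. It is clear that for all $a > 0$,

$$
\begin{aligned}
\mathbf P(\|S\|_\infty > ax) &= \mathbf P(\|S\|_\infty > ax;\ \zeta \le a) + \mathbf P(\|S\|_\infty > ax;\ \zeta > a) \\
&\le \mathbf P(\|S\|_\infty > x\zeta) + \mathbf P(\zeta > a) \\
&\le (N+1)e^{-x^2/2} + \mathbf P(\zeta > a).
\end{aligned}
$$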



To complete the proof, it suffices to replace $a$ by $\sqrt2\,(\|s\|_2 + y\|s\|_\infty)$ and to apply Lemma 2 along with the inequalities $\|s\|_2 + \|s\|_\infty y = \big(\|s\|_2^2 + 2\|s\|_\infty\|s\|_2\,y + \|s\|_\infty^2y^2\big)^{1/2} \ge \big(\|s\|_2^2 + \sqrt2\,\|s\|_4^2\,y + \|s\|_\infty^2y^2\big)^{1/2}$. □
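The conditioning argument above yields a bound that can also be probed numerically. The sketch below assumes the statement of Lemma 3 in the form $\mathbf P\{\|S\|_\infty > \sqrt2\,x(\|s\|_2 + y\|s\|_\infty)\} \le (N+1)e^{-x^2/2} + e^{-y^2/2}$; the vector `s`, the levels `x` and `y`, and the grid used to approximate the supremum are hypothetical choices.

```python
import numpy as np

def lemma3_exceedance(s, x, y, n_sim=2000, n_grid=1024, seed=0):
    """Estimate P(||S||_inf > sqrt(2) x (||s||_2 + y ||s||_inf)) for
    S(t) = sum_{j=1}^{N} s_j Re(exp(i j t) eta_j conj(eta_j^#))."""
    rng = np.random.default_rng(seed)
    s = np.asarray(s, dtype=float)
    n = len(s)
    j = np.arange(1, n + 1)
    t = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    phase = np.exp(1j * np.outer(j, t))        # shape (N, n_grid)
    eta = rng.standard_normal((n_sim, n)) + 1j * rng.standard_normal((n_sim, n))
    eta_sharp = rng.standard_normal((n_sim, n)) + 1j * rng.standard_normal((n_sim, n))
    coeff = s * eta * np.conj(eta_sharp)       # s_j eta_j conj(eta_j^#)
    paths = (coeff @ phase).real               # S evaluated on the grid, one row per draw
    sup = np.abs(paths).max(axis=1)            # grid approximation of ||S||_inf
    threshold = np.sqrt(2) * x * (np.linalg.norm(s) + y * np.max(np.abs(s)))
    return float(np.mean(sup > threshold))

s = np.ones(10)                                # hypothetical vector s
x, y = 3.0, 2.0
empirical = lemma3_exceedance(s, x, y)
bound = (len(s) + 1) * np.exp(-x**2 / 2) + np.exp(-y**2 / 2)
print(f"empirical = {empirical:.4f}, bound = {bound:.4f}")
```

As with Lemma 1, the grid maximum underestimates the supremum, so the comparison is indicative only.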